Dynamic Policy Programming with Function Approximation
Authors
Abstract
In this paper, we consider the problem of planning in infinite-horizon discounted-reward Markov decision problems. We propose a novel iterative method, called dynamic policy programming (DPP), which updates the parametrized policy by a Bellman-like iteration. For the discrete state-action case, we establish L∞-norm loss bounds for the performance of the policy induced by DPP and prove that it asymptotically converges to the optimal policy. We then generalize our approach to large-scale (continuous) state-action problems using a function approximation technique. We provide L∞-norm performance-loss bounds for approximate DPP and compare these bounds with the standard results from approximate dynamic programming (ADP), showing that approximate DPP yields a tighter asymptotic bound than standard ADP methods. We also numerically compare the performance of DPP with other ADP and RL methods and observe that approximate DPP asymptotically outperforms them on the mountain-car problem.
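To make the Bellman-like update concrete, here is a minimal tabular sketch in the spirit of this abstract, assuming a finite MDP with a known transition model; the action-preference table psi, the soft-max operator weighted by the induced Boltzmann policy, and the inverse-temperature parameter eta are assumptions borrowed from the DPP literature rather than details stated on this page.

    import numpy as np

    def softmax_policy(psi, eta):
        # Boltzmann policy: pi(a|s) proportional to exp(eta * psi[s, a])
        z = eta * (psi - psi.max(axis=1, keepdims=True))   # shift for numerical stability
        w = np.exp(z)
        return w / w.sum(axis=1, keepdims=True)

    def dpp_tabular(P, R, gamma, eta, n_iter=500):
        # P: (S, A, S) transition probabilities, R: (S, A) expected rewards
        S, A = R.shape
        psi = np.zeros((S, A))                              # action preferences
        for _ in range(n_iter):
            pi = softmax_policy(psi, eta)
            m_psi = (pi * psi).sum(axis=1)                  # soft-max value of each state
            # Bellman-like backup of the action preferences
            psi = psi + R + gamma * P @ m_psi - m_psi[:, None]
        return psi, softmax_policy(psi, eta)

In the approximate version discussed above, psi would instead be represented by a parametric approximator (for example, a linear combination of basis functions) fitted to such backed-up values at sampled states, which is where the function-approximation error terms in the performance-loss bounds enter.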
Similar resources
Extracting Dynamics Matrix of Alignment Process for a Gimbaled Inertial Navigation System Using Heuristic Dynamic Programming Method
In this paper, with the aim of estimating the internal dynamics matrix of a gimbaled inertial navigation system (as a discrete linear system), the discrete-time Hamilton-Jacobi-Bellman (HJB) equation for optimal control has been derived. A heuristic dynamic programming (HDP) algorithm for solving this equation is presented, together with a neural-network approximation of the cost function and control input ...
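As a rough illustration of the value-iteration idea behind HDP, the sketch below specializes the discrete-time HJB backup to a linear system with quadratic cost, where the cost-to-go is exactly quadratic and the backup reduces to a Riccati-style recursion; the neural-network critic and actor mentioned in the paper are replaced here by that closed form, and the matrices A, B, Q, R and the discount gamma are generic assumptions.

    import numpy as np

    def hjb_value_iteration_lq(A, B, Q, R, gamma=1.0, n_iter=200):
        # Discrete-time system x_next = A x + B u with stage cost x'Qx + u'Ru.
        # The cost-to-go is x'Px, so the critic update is a Riccati-style
        # recursion and the actor is the linear state feedback u = -K x.
        n = A.shape[0]
        P = np.zeros((n, n))
        K = np.zeros((B.shape[1], n))
        for _ in range(n_iter):
            K = np.linalg.solve(R + gamma * B.T @ P @ B, gamma * B.T @ P @ A)
            P = Q + K.T @ R @ K + gamma * (A - B @ K).T @ P @ (A - B @ K)
        return P, K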
A Multi-Stage Single-Machine Replacement Strategy Using Stochastic Dynamic Programming
In this paper, the single-machine replacement problem is modeled within the frameworks of stochastic dynamic programming and control-threshold policies, and some properties of the optimal values of the control thresholds are derived. Using these properties and by minimizing a cost function, the optimal values of two control thresholds for the time between the production of two successive nonco...
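For intuition on how a control-threshold policy emerges from such a stochastic dynamic program, here is a minimal sketch on a toy machine-deterioration model; the state space, cost structure, and degradation probability are illustrative assumptions, not the paper's model.

    import numpy as np

    def replacement_threshold(op_cost, replace_cost, p_worsen, gamma=0.95, n_iter=3000):
        # op_cost[i]: per-period operating cost in deterioration state i (increasing in i)
        # replace_cost: fixed cost of installing a new machine (state resets to 0)
        # p_worsen: probability the machine degrades by one state each period
        n = len(op_cost)
        nxt = np.minimum(np.arange(n) + 1, n - 1)
        V = np.zeros(n)
        for _ in range(n_iter):
            keep = op_cost + gamma * ((1 - p_worsen) * V + p_worsen * V[nxt])
            replace = replace_cost + keep[0]          # replace, then operate as a new machine
            V = np.minimum(keep, replace)
        threshold = int(np.argmax(keep > replace)) if np.any(keep > replace) else n
        return V, threshold                           # replace whenever the state >= threshold

Because the keep-cost grows with the deterioration state while the replace-cost stays flat, the two curves cross at most once, which is what makes a single control threshold optimal in this toy model.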
An Optimal Tax Relief Policy with Aligning Markov Chain and Dynamic Programming Approach
In this paper, a Markov chain and dynamic programming were used to derive a suitable pattern for tax relief and for reducing tax evasion, based on tax earnings in Iran from 2005 to 2009. Applying this model showed that tax evasion amounted to 6,714 billion Rials. With a 4% relief to taxpayers and by calculating the present value of the received tax, it was reduced to 3,108 billion Rials. ...
An approximation algorithm and FPTAS for Tardy/Lost minimization with common due dates on a single machine
This paper addresses Tardy/Lost penalty minimization with common due dates on a single machine. Under this performance measure, if the tardiness of a job exceeds a predefined value, the job is lost and penalized by a fixed value. We first present a 2-approximation algorithm and examine its worst-case ratio bound. Then, a pseudo-polynomial dynamic programming algorithm is de...
Tuning Approximate Dynamic Programming Policies for Ambulance Redeployment via Direct Search
In this paper we consider approximate dynamic programming methods for ambulance redeployment. We first demonstrate through simple examples how typical value function fitting techniques, such as approximate policy iteration and linear programming, may not be able to locate a high-quality policy even when the value function approximation architecture is rich enough to provide the optimal policy. ...
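A hedged sketch of the direct-search alternative mentioned above: treat the simulator as a black box and tune the policy parameters with Nelder-Mead (scipy.optimize.minimize) under common random numbers. The one-ambulance-on-a-line "simulator" and the parameter theta below are purely illustrative assumptions, not the paper's redeployment model.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    calls = rng.uniform(0.0, 1.0, size=5000)     # fixed call locations (common random numbers)

    def mean_response_distance(theta):
        # Toy policy: park the single ambulance at 'home' between calls;
        # the cost is the average distance to the next call.
        home = np.clip(theta[0], 0.0, 1.0)
        return np.abs(calls - home).mean()

    # Direct search over the policy parameter; no value function is fitted.
    result = minimize(mean_response_distance, x0=[0.1], method="Nelder-Mead")
    print(result.x)                              # close to 0.5, the median call location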
Estimation and accuracy of the stationary solution in the dynamic programming problem: New results
In this paper we give explicit error bounds for approximations of the optimal policy function in the stochastic dynamic programming problem. The approximated policy function is obtained by using the Bellman equation with an approximated value function and the error bounds depend on the primitive data of the problem. Neither differentiability of the return function nor interiority of solutions i...
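For context, one classic bound of this type (not necessarily the one derived in that paper) states that if an approximate value function V_hat satisfies ||V_hat - V*||_inf <= eps, then the policy that is greedy with respect to V_hat loses at most 2*gamma*eps/(1 - gamma) in the sup norm. The snippet below checks that inequality numerically on a small random MDP, which is itself an arbitrary assumption.

    import numpy as np

    rng = np.random.default_rng(1)
    S, A, gamma, eps = 6, 3, 0.9, 0.05
    P = rng.dirichlet(np.ones(S), size=(S, A))       # P[s, a] is a distribution over next states
    R = rng.uniform(0.0, 1.0, size=(S, A))

    V_star = np.zeros(S)                             # optimal values by value iteration
    for _ in range(5000):
        V_star = (R + gamma * P @ V_star).max(axis=1)

    V_hat = V_star + rng.uniform(-eps, eps, size=S)  # approximate value function
    pi_hat = (R + gamma * P @ V_hat).argmax(axis=1)  # greedy policy w.r.t. the approximation

    P_pi = P[np.arange(S), pi_hat]                   # evaluate that policy exactly
    R_pi = R[np.arange(S), pi_hat]
    V_pi = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

    print(np.max(V_star - V_pi) <= 2 * gamma * eps / (1 - gamma))   # prints True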
Journal title:
Volume / Issue:
Pages: -
Publication date: 2011